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Abstract  (Unclassified) 

Many  commercial  Geographic  Information  Systems  have  a  dual  architecture:  tbit  thematic  infor-/ 
mation  is  stored  in  a  relational  database  management  system  and  the  spatial  information  is  ston 
in  a  separate  subsystem  capable  of  dealing  with  spatial  data  and  spatial  queries.  Besides  not  l^g 
elegant  conceptually,  this  dual  architecture  also  reduces  the  performance,  because  objects  hrtve  to 
be  retrieved  and  compiled  from  components  that  may  be  stored  far  apart  in  me  two  subsystems. 
We  present  a  solution  based  on  the  extendable  database  management  system  ’‘Postgres,’*  in  which 
thematic  and  spatial  data  are  stored  together. 


The  contents  of  this  report  was  presented  at  the  Second  European  Conference  on  Geogrqthical 
Irtformation  Systems,  EGIS  '91,  Brussels,  Belgium,  April  2-5,  1991.  This  contribution  was 
classified  by  the  EGIS'91  Program  Committee  to  belong  to  the  five  best  papers  out  of  ISO  pqters 
selected  from  325  proposals. 
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1  Introduction 


Many  of  today's  infonnation  systems  are  built  around  a  relational  database  (RDBMS).  The 
geogr^>hic  nature  of  the  data  in  Geognq^c  Infonnation  Systems  (GISs)  results  in  a  number  of 
problems  for  an  RDBMS.  It  is  impossilde  to  state  queries  like;  “Which  dries  with  a  population 
giealer  than  1(X),(X)0  lie  within  10  kilometers  from  the  river  Rhine"  in  the  data  manipulation 
language  of  RDBMS.  Even  if  it  were  possible  to  state  this  type  of  queries,  then  they  could  not 
be  answered  efBdently,  because  an  RDBMS  lacks  the  pn^r  multi-dimensional  index  stnictures. 
Therefore,  the  storage  system  often  consists  of  two  major  subsystems.  We  will  call  this  the  dual 
archiuciure.  The  fost  subsystem  is  based  on  an  RDBMS  and  contains  the  alphanumeric  thematic 
data.  The  second  subsystem  is  a  spedai  purpose  storage  system,  capable  of  dealing  with  spatial 
data.  An  object,  that  has  both  a  thematic  artd  a  spatial  component,  has  parts  in  both  subsystems 
which  ate  linked  by  common  identifier.  In  order  to  retrieve  one  object,  two  subsystems  have  to  be 
queried  and  the  answer  has  to  be  composed  Many  commerdal  GISs  have  chosen  this  inefBdeni 
and  inelegant  dual  ardiitectute. 

We  investigate  a  data  model  that  does  not  have  this  dual  architedure.  We  have  diosen  the 
relational  dau  model  as  our  starting  poirtt,  because  it  is  a  powerful  and  well-known  model.  The 
required  extension  to  the  relational  data  modd  is  based  the  incorporation  of  [30]:  geogr^riiic  data 
types  (e.g..  iine,  polygon),  spatial  (iterators  (e.g.,  distance,  overitq>,  nearest-ndghbor),  and  spatial 
index  structures  (e.g.,  R-tree  [12],  KD2B-tree  [29]).  However,  most  DBMSs  are  closed,  i.e.,  it  is 
impossible  to  add  new  data  types,  t^ierttois,  or  ittdex  structures.  An  exception  is  Postgres  [23], 
an  experimental  DBMS  developed  at  the  Iftiiversity  of  Qdifomia,  Berkeley.  are  developing  a 
prototype  GIS  based  on  Postgres. 

Chapter  2  states  the  functionality  that  a  GIS  demands  from  a  DBMS.  Besides  the  functional 
aspects,  this  paper  also  pays  attention  to  perfotmatKe  aspects  of  a  solution.  Our  prototype  GIS, 
called  the  GEO  system,  is  implemented  on  a  Sun  3/60  wcnkstatimi.  Before  descril^  this  system 
in  Chapter  4,  a  short  introduction  to  rite  most  important  aqtects  of  Postgres  is  given  in  Chapter  3. 
This  paper  is  ctmcluded  in  QuqrterS  with  some  final  remarks  and  suggestions  fi>r  future  research 
and  developmetH. 
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2  DBMS  Capabilities  required  by  GIS  applications 


GISs  lequiie  moie  c4>abiUties  from  a  DBMS  than  the  tiaditional  business  data  processing  systems. 
This  is  also  tnm  for  other  spatial  infoimation  systems,  such  as  CAD  systems.  The  mathemmically 
sound  relational  data  mode!  [4]  is  the  stalling  point  for  the  following  discussion.  This  chapter 
enumerates  the  most  impoitant  requirements  of  a  DBMS  that  is  used  by  a  GIS.  The  DBMS  must; 


rl  Store  alt  the  data  in  one  storage  system,  that  is,  both  the  spatial  data  and  the  non-spatial  (or 
thematic)  data.  This  avoids  the  diawbadts  of  a  dual  architecture. 

r2  Support  that  tuples  of  the  same  relation  may  have  different  sizes.  This  is  a  very  common 
situation  in  GISs  as  the  number  of  points  per  polyline  or  polygon  is  not  fixed. 

r3  Represent  Complex  objects.  Various  geogr^thic  objects  consist  of  multiple  components. 
For  example,  the  representation  of  a  city  may  consist  of  a  collection  of  roads,  buildings,  and 
various  other  terrain  elements.  These  must  be  quidcly  accessible. 

r4  Enable  the  exchange  complete  geographic  data  sets  (including  the  possible  relationships 
between  the  differem  individual  entities)  bmween  various  GIS  sites. 

rS  Suppon  (or  be  extendable  to  support)  both  raster  and  vector  data.  The  vector  data  can  be 
subdivided  in  the  following  three  geometric  data  types:  point,  polyline,  and  polygon.  There 
must  be  two  and  three  dimensional  variants  of  these  types;  see  Section  2.1. 

r6  Supportthefollowingthreecategoriesofjiparfa/flperarors;  geometric calculatim operators, 
topological  operators,  and  spatial  comparison  operators;  see  Section  2.2. 

rl  Provide  spadal  index  structures.  As  GISs  usually  deal  with  large  collections  of  geogrqrhic 
data,  a  spatial  index  structure  is  indispensable;  see  Section  2.3. 

r8  Provide  big  tuple  attributes,  because  polylines,  for  example,  may  contain  diousands  of 
segmertts. 

r9  Allow  the  implementation  of  an  odwvicedgrqpWciuer-frtrnyace  through  a  direct  interface 
with  a  hi^  level  programming  language,  such  aa  C 


These  requiremertts  are  not  independei*  of  eadi  odn;  For  example,  the  fact  that  all  data  types 
have  to  be  atored  in  OIK  storage  system  (rf).  causes  most  of  the  othmtequiremems.  ReqniiemeMs 
r5,  rd,  and  r7  are  explamed  in  more  detail  in  foe  next  three  sectiona. 
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[a]  create  tower  (name  s  charl6,  owner  =  charl6,  location  =  point2) 

[b]  create  road  (name  s  charlS,  shape  =  polyline2) 

[c]  create  town  (name  s  charl6,  #inhabitants  =  int4,  shape  =  polygon2) 

Figuur  2.1:  Relations  with  geogn4>hic  attributes 

2.1  Geographic  Data  T^pes 

Neatly  all  geognphic  data  processing  [ 1 6. 25, 26]  is  peifoimed  with  vector,  raster,  or  a  ctHnbination 
of  these  geometric  data  formats.  The  vector  foimat  has  three  subtypes:  point,  polyline  and 
polygon.  The  emphcris  is  on  the  vector  rqtresentation,  because  it  allows  more  flexible  object- 
oriented  manipulations,  though  the  raster  representation  also  has  advantages;  see  [27].  Although 
their  representations  mi^t  be  complex,  these  data  types  must  be  regarded  as  atomic  values  in  the 
data  model. 

Figure  2.1  demonstrates  the  use  of  geographic  attributes  in  relations  using  the  Postquel  query 
language;  see  Section  3.2.  The  “2”  behind  point,  polyline,  and  polygon  indicates  that  these  are 
two  dimensional  attributes.  Similarly,  it  is  possible  to  have  three  dimensional  variants  of  these 
data  types;  a  “3”  behind  the  type. 


2.2  Spatial  Operators 

This  chapter  describes  the  basic  ^atial  operators.  More  complex  operators  exist,  but  do  not 
have  to  be  itKiuded  in  the  basic  CIS  database  system.  For  example,  not  included  are:  network 
calculations,  advanced  visualization  techniques,  simulation,  and  complex  geometric  ca-^culations. 
such  as  calculating  the  Vironoi  digram,  the  convex  hull,  orthe  smallest  enclosing  circle.  Although 
polygon-overlay  is  a  ccmiplex  operation,  it  is  used  in  many  CIS  applications,  and  it  must  be  added 
to  the  set  of  su^ard  operators.  The  polygon-overlay  takes  two  sets  of  polygons  and  calculates 
all  mtersections,  which  results  in  a  third  set  of  polygons. 

Many  ^atial  operators  or  functions  have  been  described  by  various  authors  [2, 11. 13, 14, 19].  IKfe 
do  not  claim  that  our  lists  of  operators  are  complete,  but  they  should  give  a  good  impression  of  the 
basic  spatial  openton.  We  distii^uish  three  fundameruai  dasses  of  ^atial  operators  in  addition 
to  the  more  standard  operator  dasses  (e.g..  comparison,  logical,  statistical,  and  set  operators)  in 
«  RDBMS: 

1.  Geomerrfc  oo/cvlliirioii  qperasors  return  a  scalar  value  or  a  geometric  value.  Some  of  the 
most  importam  operators  are;  distance,  length,  perimeter,  area,  closest, 
intersection,  and  union. 


i4 

I 


2.  ToptAogkai  openton  return  a  geometric  value.  Some  examples:  nd^ibots,  next  IMc  (hi 
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a  polyline  network),  left  and  right  polygons  of  a  polyline,  start  and  end  nodes  of  polylines. 
Ws  do  not  need  an  extra  set  of  topological  operators,  because  the  topological  model  can  be 
ci|)tuted  in  a  natural  manner  in  the  standard  relational  model;  see  [30]. 

3.  Spatial  comparison  operators  f^tn  a  Boolean:  true  or  false.  Although  the  calcula¬ 
tions  ate  similar  to  the  previous  classes  of  operators,  drey  form  a  separate  group.  Some 
examples  ate:  intersects,  inside,  larger.than,  outside,  north.of,  on,  and 
neighbor.of.  All  comparison  curators  have  two  operands,  whidi  may  be,  in  most 
cases,  of  any  geometric  type.  Note  that  it  is  often  possible  to  emulate  these  operators  by 
ctwrbining  the  standard  comparison  operators  with  the  geometric  or  topological  operators. 


As  stated  above,  there  can  be  other  qratial  operators  that  ate  useful  for  a  specific  applicatitm.  But  it 
is  impossible  to  include  them  all  in  this  paper.  An  “open”  database  is  the  solution  for  the  dilemma 
which  operators  do  belong  and  which  operatois  do  not  belong  to  the  system.  If  an  operator  is 
not  available  in  the  set  of  basic  spatial  operators,  it  can  be  implemented  by  the  user  and,  after  it 
has  been  certified,  added  to  the  database  system.  In  this  way  other  users  also  benefit  from  the 
new  capabilities.  Note  that  organizational  actions  have  to  be  takoi.  For  example,  someone  must 
ensure  the  clear  and  unique  naming  of  operators.  It  is  possible  to  formulate  queries  using  the  basic 
spatial  operators  as  will  be  demonstrated  in  Chapter  3. 


2.3  Spatial  Indexing  Techniques 

The  B-ttee  [  1  ],  an  indexing  technique  used  in  many  DBMSs,  combines  several  desirable  properties. 
It  is  a  dynamic,  height  balartced  structure,  i.e.,  insertions,  deletions,  and  updates  of  entries  may  be 
alternated  with  searches.  Because  of  the  balanced  nature,  searches  are  efficient  (0(log  n)  time). 
Also,  the  nodes  in  the  B-ttee  ate  at  least  half  filled.  This  results  in  a  compact  structure. 

However,  the  B-tree  is  only  suited  for  searching  based  on  one  dimensional  attributes,  such  as 
numbers  or  strings.  Multiple  indices  on  mote  than  one  attribute  of  a  relation  ate  possible,  but 
(with  current  implementations  of  RDBMSs)  only  one  can  be  used  for  solving  a  query  like: 

ratriav*  (towar.all) 

whara  5  <  towar. location. x  <  10 
and  12  <  towar. location. y  <  20. 


These  poittt  queries  can  be  solved  effideikly  by  the  XDB-frre  [  1 8]  index  strocture.  The  KDB-tree 
is  a  KD-(tee  adapted  fiM’  secondary  storage  and  csn  handle  pdta  data  in  any  dimension.  The 
KD-tree  cannot  handle  the  ptdyiine  and  polygon  data  types.  In  the  literatnie  there  ate  several 
solutions  fi>r  this  problem,  such  as;  the  R-tree  [S]  (see  Section  3.4),  die  Fidd-tree  [6, 7],  die  Cdl 
tree  [10],  the  KD2B-ttee.  the  Sphere-tree  [29],  and  die  Reactivemee  [28].  When  fine-tuning  die 
application,  the  proper  indexing  technique  has  to  be  selected. 
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3  Description  of  Postgres 


This  chapter  gives  an  introducticm  to  the  i:q>en  DBMS  Postgres.  A  more  detailed  functional 
description  can  be  found  in  [22]  and  several  implementation  dedsions  are  discussed  in  [23].  The 
Postgres  reference  manual  [33]  contains  all  the  information  required  to  use  the  system.  Postgres, 
the  successor  of  Ingres,  is  a  leseardi  project  directed  by  Michael  Stonebrdcer  at  the  University 
of  California,  Berkeley.  The  characteristic  new  concepts  in  Postgres  are:  support  for  complex 
objects,  iitheritanoe,  user  extenddulity  (with  new  data  types,  operators  and  access  methods), 
versions  of  relations,  and  support  for  rtiles.  The  later  may  be  used  to  implementation  constraints. 

In  the  next  sections  we  explain  the  features  that  are  of  interest  to  GISs.  Wt  will  illustrate  these 
features  with  some  CIS  examples  based  on  the  current  version  of  Postgres  (version  2.0.3).  First, 
Section  3.1  describes  the  global  architecture  of  Postgres.  The  next  section  describes  the  query 
language  Postquel  and  Section  3.3  gives  an  example  with  a  user  defined  type.  In  Section  3.4  the 
spatial  access  method  that  is  available  within  Postgres,  the  R-tiee,  is  described. 


3.1  The  Architecture  of  Postgres 


Postgres  can  be  viewed  as  a  collection  of  files  and  processes  that  operate  on  these  files.  The  files 
contain  the  relations  and  data  required  for  the  access  methods,  that  is,  the  B-tree  or  the  R-tiee  itself. 
A  daemon  process  postmaster  handles  the  conununication  between  the  backend  (the  process  that 
does  the  re^  DBMS  work  and  is  therefore  called  Postgres)  and  Omfrontend  or  applicatioa  The 
postmaster  starts  a  backend  process  for  each  application  that  requests  the  services  of  Postgres.  A 
standard  Postgres  application  is  the  monitor,  an  alphanumeric  user-interface  for  Postgres.  The  user 
may  state  Postquel  queries  and  the  answers  are  diqrlayed  in  a  tabular  format  New  qrplications 
can  be  develop^  ba^  on  Postgres  by  using  the  C  library  functions  of /ffipq.  This  library  contains 
functions  to  pass  the  queries  to  the  badtend  and  to  irderpiet  the  bufiers,  called  portals,  whidi  are 
used  to  return  the  results.  Another  way  of  interacting  with  Postgres  is  by  using  the  fast  path. 
The  fast  path  makes  it  possible  to  call  Postgres  system  functions  iBiectly.  In  tlus  way  the  query 
language  is  bypassed  arid  best  performance  is  adiieved  by  calling  the  access  methods. 


3.2  The  Query  Language  Postquel 


The  query  language  Posiquei  is  based  on  three  concepts: 

•  There  are  three  kiiids  of  dma  typer,  base  types  (built-in.  system,  and  uaei),  amy  types 
(fixed  and  variable  length)  and  coraposile  types  (tuple,  set  of  tuples,  and  rdatkm). 
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[a]  retrieve  min (distance (tower. location,  * (10, 15) * : :point2) ) 

[b]  retrieve  closest (tower. location,  * (10, 15) * : :point2) ) 

[c]  retrieve  (tower. name) 

where  inside ( (retrieve  (town.po.tygon) 

where  town. name  =  ‘Amsterdam*),  tower . location) 

Figuur  3.1;  Post()uel  queries  using  geographic  functions 

•  The  following  kinds  of functions  are  available:  nonnal  functions  (C  or  Postquel),  operators 
(binding  of  a  symbol  to  a  fimction),  aggregate  functions  (count,  sum,  average,  min,  max, 
etc.),  and  inheritable  functions. 

s  Rules  have  the  form:  “on  condition  then  do  action”  and  they  are  used  to  trigger  DBMS 
actions.  Section  4.4  gives  an  example. 

User  defined  types,  with  their  own  functions  and  operators,  are  of  particular  interest,  because 
these  may  be  used  to  defirre  the  geographic  data  types.  Section  3.3  describes  the  user  types  in 
more  detail.  The  database  administrator  may  “upgrade”  user  types  to  system  types,  making  them 
available  to  each  data  base  created  on  the  system. 

The  current  distribution  of  FOstgres  already  contains  a  system  type  example  that  ^>plT>ximates 
a  two  dimensional  variant  of  the  extension  with  geographic  data  types  we  proposed.  It  conrists 
of  the  four  types:  point ,  Iseg,  path,  and  box.  The  type  Iseg  implements  a  single  liiK 
segment  Polylines  and  polygons  may  be  represented  by  path,  which  is  a  variable  length  array 
of  Iseg.  The  special  case  of  a  two  dimensional  axes-parallel  rectangle  is  represented  by  the 
type  box.  Some  useful  functions  and  operators  are  provided  (test  for  overiaiping  boxes,  test  if  a 
point  lies  inside  a  box,  the  distance  between  two  points),  but  more  are  required  for  a  really  good 
geographic  extension  of  the  DBMS;  see  Section  22. 

Some  practical  CIS  example  queries  show  how  geographic  functions  might  be  used  in  Postquel; 
see  Figure  3.1.  (}uery  [a]  is  a  “minimum  distance"  query.  This  will  not  work  in  Postgres 
version  2,  because  the  aggregate  function  min  is  not  yet  implemented.  The  next  query  [b]  does 
the  same  thing,  but  is  formulated  more  efficiently  by  using  the  function  closest.  The  last 
example,  query  [c],  uses  the  inside  function  in  order  to  retrieve  all  the  munes  of  the  towers  in 
Arosieidam. 


3  J  Defining  a  User  l^pe 

The  example  in  this  section  defines  the  new  user  type  efrefe.  This  example  is  based  on  the  tutorial 
distributed  with  Postgres.  It  is  impmtam  to  reaiiae  that,  in  Postgres,  there  is  a  difference  between 
the  internal  and  the  external  icpresentation  of  a  type.  The  external  representation  is  a  character 
strir^foruseriripmandoutpotasuaedinthemoniloi;  in  the  case  of  a  dide  this  could  be:  (oetMerjt, 
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#include  <stdio.h> 

typedef  struct  {  double  x,  y;  )  POINT; 

typedef  struct  {  POINT  center;  double  radius;  )  CIRCLE; 

/*  The  internal  representation  */ 

CIRCLE  *circle_in(str) 

/*  Convert  from  external  to  internal  representation  */ 
char  *str; 

( 

/*  Allocate  new  CIRCLE,  parse  string,  return  result.  */ 

) 

char  ♦circle_out (circle) 

/*  Convert  from  internal  to  external  representation.  •/ 
CIRCLE  ‘circle; 

{ 

char  ‘result; 

if  (circle  ==  NULL)  return (NULL) ; 

result  -  (char  ‘)  palloc(60); 

(void)  sprintf (result,  * (%g, %g, %g) • , 

circle- >center.x,  circle->center .y,  circle'>radius); 
return (result) ; 


char  circle_area_greater (circlel,  circle2) 
CIRCLE  ‘circlel,  ‘circle!; 

(  return (circlel->radius  >  circle2->radius) ;  } 


Figuur  3.2:  A  part  of  the  C  source  code  defining  the  new  type  circle 
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[a]  define  C  function  circle_in  (file  =  ‘circle. o", 

returntype  =  circle)  arg  is  (charl6) 

[b]  define  C  function  circle_out  (file  =  ‘circle. o‘, 

returntype  =  charl6)  arg  is  (circle) 

(cj  define  type  circle  (internal length  =  24,  input  =  circle_in, 
output  =  circle_out) 

[d]  define  C  function  circle_area_greater  (file  =  ‘circle. o’, 

returntype  =  bool)  arg  is  (circle,  circle) 

[e]  define  operator  >  (argl  =  circle,  arg2  =  circle, 

procedure  =  circle_area_greater) 

[f]  create  tutorial  (a  =  circle) 

[g]  append  tutorial  (a  =  ‘ (5, 1, 9) ‘ : :circle) 

[h]  append  tutorial  (a  =  ‘ (2, 2, 5) ‘: :circle) 

[i]  append  tutorial  (a  =  • (0, 1,7) ‘ : :circle) 

[jl  retrieve  (tutorial. all)  where  tutorial. a  >  • ( 0, 0, 8) • : rcircle 


Figuur  3.3:  The  Postquel  part  of  defining  the  new  type  circle 

center.y,  radius),  for  example,  (0,  0, 1):  the  unit  circle.  The  internal  r^resentation  detennines 
how  the  type  is  organized  in  memory,  just  as  in  the  programming  language  C.  Figure  3.2  shows 
the  C  code  that  defines  the  internal  representation  of  the  new  type  circle  and  some  C  fimctions  for 
it.  Assume  that  this  is  stored  in  the  file  circle .  c. 

The  functions  circle.in  and  circle.out  perform  the  conversions  between  external  and  inter¬ 
nal  representations.  There  is  also  one  operaror  function  for  this  type;  circle.area.greater, 
which  detennines  whether  the  area  of  the  first  circle  is  greater  than  the  area  of  the  second  one. 
After  compiling,  which  produces  the  object  file  circle. o,  Postgres  must  be  informed  about 
the  existence  of  the  new  type  and  its  functions;  first  the  conversion  functions  are  defined  (see 
Figure  3.3  queries  [a,b]).  then  the  new  type  (cj  is  defined,  and  finally  the  operator  ftmcticm  [d]  and 
it's  symbolic  representation  [e]  are  defined;  an  '*>”  sign.  Now  it  is  possible  to  create  relations 
with  circles  in  them  [f],  append  records  to  them  {g3i,i).  and  retrieve  the  circles  which  have  an  area 
greater  than  the  circle  (0,0,8)  [j].  The  result  of  the  last  query  (j]  is  of  course  the  circle  (5.1,9). 


3.4  The  R-tree 

The  Jt-iree  was  defined  by  Guttman  (12]  in  1984.  The  leaf  nodes  of  the  R-tree  contain  entries 
of  the  form:  (l/)bject-identifier),  where  obJect~identflier  is  a  pointer  to  a  data  olgect  and  /  is  a 
bounding  box  (or  Minimal  Bounding  Rectangle,  MBR).  The  iittenul  nodes  coittain  entries  of  the 
form:  (/.cWitf-potir/er),  where  cMM-pofnrer  is  a  poirrter  to  a  diild  node  and  /is  the  MBR  of  that 
diild.  The  maximum  number  of  erUtics  in  each  node  is  called  the  braacUng  factor  M  and  is 
chosen  to  suit  paging  and  disk  I/O  buffering.  The  insert  and  delete  algorithms  of  Outiman  assure 
that  the  tree  is  balmced  (all  leaf  nodes  are  on  the  same  level)  and  that  the  rtumber  of  etttries  in 
each  node  lies  between  m  and  Jl/.  where  m  <  Mfl  is  the  minimum  number  of  entries  per  node. 
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Figure  3.4  shows  an  R-tree  with  two  levels  and  M=4.  The  lowest  level  contains  three  leaf  nodes  j 

and  the  highest  level  contains  one  node  with  pointers  and  MBRs  of  the  leaf  nodes. 


Figuur  3.4;  The  R-trec 

The  infonnation  in  the  remainder  of  this  chapter  is  based  on  the  beta  version  of  the  R>tiee  in 
Postgies  and  is  provided  by  Mike  Olson[17].  The  R-tree  will  be  included  in  the  next  public 
release  of  Postgres  (version  2.1).  The  perfoimantx  tests  with  the  R-tree  of  Postgies  are  done  on 
DEC^ation  S000/2(X)  under  Ultrix  4.0.  The  use  of  the  R-tiee  is  similar  to  the  use  of  the  B-tree 
in  Postgies.  That  is,  one  can  use  the  Postquel  construct  define  index  to  define  an  index  on 
the  region  attribute  of  the  relation  testrel;  see  Figure  3  J  [a,d].  The  R-tiee  can  be  used  with 
system  type  box  and  the  following  operators; 


operator 

meaning 

a  «  b 

box  a  is  strictly  left  of  box  b 

a  &<  b 

a  is  left  of  b,  or  overlaps  b,  but  does  not  extend  to  the  right  of  b 

d  &&  b 

a  overiqis  b 

a  &>  b 

a  is  right  of  b,  or  overlaps  b,  but  does  not  extend  to  the  left  of  b 

a  »  b 

a  is  strictly  right  of  b 

a  e  b 

a  is  ctmtained  by  b 

a  ~  b 

a  contains  b 

a  ~=  b 

a  and  b  are  the  same  box 

The  relation  testrel  is  populated  (Figure  3.S  [b])  with  30,000 rectangles,  sides  mdom  between 
0  and  1000,  and  origitis  random  distributed  in  three  regions  10,000 in  (0, 0, 10000, 10000),  10,000 
in  (30000, 10000,  SOOOO,  30000),  and  10,000  in  (0, 0,  SOOOO,  SOOOO).  TMs  daU  set  U  chosen, 
because  it  is  r^resentative  for  map  data:  (Ejects  of  difiereni  sizes  and  a  population  density  that 
is  not  constant  over  the  whole  region.  Figure  3.5  [c]  d»ws  a  rectangle  overlap  query.  TMsisan 
impoitam  type  of  query,  because  it  is  used  to  generate  maps  on  the  rectangular  scteea  A  point 
query,  used  for  implementing  a  “irick”  tqieration,  can  be  formulated  by  taking  a  box  widi  equal 
diagonal  comer  ptunts. 

A  restrictive  spatial  query,  that  retrieves  up  to  100  otgects,  without  the  R-tree  takes  ihout  85 
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[a]  create  testrel  (region  =  box) 

[b]  append  testrel  /•  append  lots  of  tuples  */ 

[c]  retrieve  (testrel .all)  wliere  testrel . region  &&  ■  (98, 20, 9, 10)  ■ tljox 

[d]  define  index  testind  on  testrel  using  rtree  (region  box_ops) 


Figuur  3  Defining  an  R-tiee  index  for  the  relation  testrel 

seconds  using  a  sequential  scan.  Building  an  R-tiee  index  on  the  test  relation  with  30,000  objects 
takes  about  35  minutes.  However,  now  the  same  spatial  queries  nm  typically  a  few  hundred  times 
faster  using  the  index  scan.  As  the  size  of  the  relation  grows,  the  gain  of  the  index-scan  will 
become  laiger  and  larger  compared  to  the  sequoitial  scan. 

The  size  of  the  file  that  contains  the  testrel  is  3.0  Mb.  The  size  of  the  file  that  contains  the 
index  test  ind  is  6.7  Mb.  This  may  seem  a  tot  in  comparison  the  relation  testrel,  but  in  the 
case  of  GIS-application  with  tuples  that  have  polygon  attributes  varying  from  10  to  1,000  points, 
the  overhead  of  the  R-tree  is  quite  acceptable. 
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4  Implementation  of  a  GIS  on  top  of  Postgres 


We  have  built  the  GEO  system,  a  general  puipose  GIS  frontend  for  Postgres.  The  system  has  a 
“direct  manipulation  user-intefface,”  allows  us  to  implement  real  woild  GIS  systems,  and  allows 
us  to  experiment  with  the  user-interface  and  various  data  stnictuies  and  storage  tedmiques.  Some 
of  the  expected  applications  are:  electiiMiic  sea-maps  and  various  Command  and  Control  (C2) 
systems.  The  current  prototype  system  is  written  in  [24]  and  uses  the  ET^  [31, 32]  class 
library. 


4.1 

ET+'^  is  a  class  library,  written  by  Andre  Weinand,  Erich  Gamma  aix)  Rudolf  Marty  of  the 
University  of  2Uirich.  The  library  consists  of  a  Smalltalk-like  collection  of  classes,  just  as  Keith 
Gorien's  NIH'  [9]  class  library.  The  library  fiiither contains  graphic  user-interface  building  Modes 
in  a  manner  similar  to  Interviews.  In  contrast  to  NIH  and  Interviews,  ET^  contains  both  in  an 
integrated  design. 

An  executable  ET^  program  attaches  automatically  to  the  actually  used  Window  System:  XI 1 
[8,  20],  News,  or  SunView.  The  visual  jqrpearance  of  the  program  is  the  same  in  each  of  the 
window  systems.  The  ninning  program  also  behaves  the  same.  This  is  possiMe  because  ET^ 
defers  calling  of  the  aaual  underlying  window  syston  till  the  lowest  levd  of  drawing  lines  and 
pixels.  The  fonts  used  and  the  layout  and  interactim  of  the  user-interface  building  blocks  give 
it  a  Macintosh  appearance.  This  is  no  coinddetKe,  since  the  authors  based  many  of  their  ideas 
on  the  MacApp  framework  [21].  In  the  context  of  user-interface  design,  ET^  has  the  advantage 
over  most  other  graphic  user-interface  toolkits  (SunView,  XView,  Xt-widgets)  that  it  enaMes  the 
designer  to  change  every  aspect  of  the  visual  appearance  and  “look  and  feel”  by  overruling  the 
appropriate  methods  in  the  class  inheritance  framework.  This  allows  us  to  implement  our 
own  ideas  of  the  ideal  “look  and  feel”  without  rewriting  most  of  the  toolkit.  Other  features  which 
can  be  built  with  with  minimal  programming  effort  using  ET*^ ,  are: 

•  A  Smalltalk-like  class  hierarchy  and  source  browser. 

•  Dynoimc  loading. 

•  Generic  objea  HO. 

•  Citfan<f/*astebelweeniqiplicmions. 

•  Multiple  paner.  a  wfaidow  can  be  split  into  two  or  four  sections  wWch  rfiow  difleicnt 
portiems  of  the  image  in  the  window.  Each  pane  has  individual  scrollban. 


'The  orismal  mm  of  NIH  WH  OCK. 
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Figuur4.1;  The  user-interface  ofthe  GEO  system,  showing  a  oamposedqiieiy 

•  Generation  of  a  PostScript  file,  containing  the  contents  of  a  window.  The  output  does  not 
result  in  a  bitmap  image,  but  uses  the  fiill  resolution  of  the  PostScript  device. 


4.2  The  GEO  system 

The  GEO  system  has  the  following  femures; 

1.  A  special  purpose  dau  ttncmrt  Ui^tr  (conenlly  our  own  R-tiee  impleiiiesdalion).  The 
layer  allows  the  dis|day  of  geographic  dsu  not  sioied  in  Posigics  because  query  ssking 
fonclionalityisnotieqoiicd.  This  special  purpose  dafobose  is  fast  and  csn  be  used  for  map 
background  data,  such  as  landmass  contour  data,  riveis,  cm.  When  the  R-tree  becomes 
available  within  Postgies  and  proves  to  be  efiBcieni,  then  the  special  purpose  data  structure 
layer  ufll  be  removed. 

2.  A  Posrgner  foyer  wfakfa  enables  die  endHiser  to  fonaslaieqnaies  by  vshig  a ''dhectman^r- 
elation  oser-interfooe,’' hi  oontrast  to  typing  posiqud  queries.  In  case  of  geographic  da^ 
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the  results  of  these  queries  are  displayed  as  labeled  points  or  polylines  on  the  mtq>.  It  is 
also  possible  to  present  the  results  in  a  tabular  fonnaL  Further,  this  module  allows  the  user 
to  modify  Postgres  data  with  a  “direct  mampulation  user-interface.”  The  Postgres  layer  is 
discussed  in  more  detail  in  Sectim  4.3. 

3.  An  OHHotation  layer.  This  layer  has  drawing  capabilities  as  found  in  many  drawing  programs 
(text,  polylines  with  arrows,  polygons,  etc.)  and  allows  the  exchange  of  information  between 
the  users  of  different  networked  woikstations.  Users  can  view  and  edit  different  annotation 
overlays  and  look  at  overlays  created  by  others:  a  briefing. 

4.  The  c^ability  of  making  changes  in  Postgres  data  visible  without  specific  user  actions 
makes  it  possible  to  create  dynamic  displays  with  moving  and/or  changing  objects;  see 
Section  4.4. 

3.  The  possibility  to  customize  the  GEO  system  by  hiding  features  and  options  of  the  system 
which  are  not  needed  for  a  specific  a{q>lication  or  adding  special  functions  or  icons  by 
editing  metadata  in  the  Postgres  database. 


In  the  current  system  only  the  features  1-3  have  actually  been  implemented,  while  features  4  and 
5  are  underdevelopment. 


4.3  The  Postgres  Layer 


In  the  Postgres  layer,  one  first  specifies  the  database  table  and  the  attributes,  which  should  be 
retrieved.  Then,  the  selection  criteria  can  be  :q)ecified  by  building  a  tree  which  rqnesents  the 
"where”  clause  of  the  query.  The  resulting  tree  is  the  grqihic  r^resentation  of  the  parse  tree  of  a 
“where”  clause  in  the  Fristquel  query  language.  The  operations  that  the  user  can  apply  to  die  nodes 
in  this  tree  are  the  productions  in  the  context  free  grammar  describing  a  Postquel  “where”  dause, 
although  the  end-user  is  probably  not  (and  should  not  be)  aware  of  this  underlying  prindple.  This 
guarantees  that  all  possible  “where”  clauses  can  be  spedfied.  This  gnqthic  tree  building  has  two 
advantages: 


1.  The  graphic  tree  representation  makes  inherently  complex  boolean  queries  easier  to  nnder- 
startd  (parse)  for  the  end-user. 

2.  It  is  not  possible  to  fonnulate  queries  that  result  in  a  syntax  enoc  The  system  checks  the 
parameter  types  of  functions  and  operators  and  guides  the  user  by  die  sdectkm  of  the  actaal 
parameters.  For  examide,  if  the  user  chooses  a  fimction  like  distance,  dtenhecanonly 
sded  table  attributes  diat  are  of  the  oomd  type  Cpoint2). 

Most  enors  made  by  the  useis,  however,  are  senuntic  and  not  syntactic.  This  oo^iidve  mpeat 
deserves  future  research.  The  piodnctiont  (rewriting  tides)  are: 
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•  Choosing  a  function  or  operator,  which  is  used  for  implementing  the  restriction. 

•  Choosing  a  table  attribute,  which  is  an  operand  for  a  selected  function  or  (iterator.  Thismay 
be  an  attribute  of  another  table  (implementing  joins)  or  of  a  previously  composed  queiy. 

•  Choosingaconstantfiromarangeoftypes(bool,int4,point2,polyline2,polygon2, 
text,  etc.)  as  operand.  The  availaUe  types  are  retrieved  from  the  Postgres  system  tables. 

•  Choosing  a  boolean  operator  and,  or,  and  not.  This  enaUes  the  user  to  create  more 
complex  queries. 


The  query  in  Figure  4.1,  which  retrieves  (and  displays)  all  objects  of  relation  ol  which  have  a 
height  less  than  20  and  whose  distance  from  at  least  one  of  the  objects  in  relation  o2  is  less  than 
100  kilometers,  could  be  composed  by  the  following  sequence: 


1.  Choose  operator  <  (float4,  floaC4) 

2.  Choose  attribute  ol. height 

3.  Choose  constant  and  enter  20 

4.  Choose  boolean  AND 

5.  Choose  operator  <  (floats,  floatS) 

6.  Choose  operator  <=>  (point, point) 

7.  (3K>ose attribute  ol.loc 

8.  Choose  attribute  o2.1oc 

9.  Choose  constant  and  enter  100. 


Note  that  constants  are  automatically  cast  to  the  conect  type,  that  is,  £loat4  in  line  3  and 
floats  in  line  9.  However,  attributes  can  only  be  selected  if  they  are  of  the  coirect  type,  that  is, 
float4  in  line  2  and  point2  in  line  7  and  8. 

Of  coune,  the  same  queiy  could  be  composed  by  applying  the  productioas  in  ■wiher  order  For 
example,  start  with  operator  AMD,  chooM  operator  <  (floats.  floatS),  etc.  The 
only  restriction  in  the  current  version  is  that  the  opemtorhas  to  be  selected  befitre  the  operands  can 
be  chosen.  So  one  has  to  know  the  type  of  the  openssis.  The  next  veision  win  apply 
typecasting  (e^.,  the  system  chooses  between  <  (int4,int4)  atid<  (floats,  floats)) 
and  will  allow  the  user  to  select  an  operand  fim  and  insert  an  operator  later 
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la]  create  ol  (loc  =  point2.  neune  x  charl£,  height  =  float4)  ! 

lb]  create  changes  (relnome  >  charlS,  changedoid  x  oid)  J 

[c]  define  rewrite  rule  olappend  is 

on  append  to  ol 

do  append  changes ( relnone  =  *ol*,  changedoid  s  current. oid) 

[d]  define  rewrite  rule  oldelete  is 

on  delete  to  ol 

do  append  changes (relnome  x  *ol*.  changedoid  s  current. oid) 

[e]  define  rewrite  rule  olchonge  is 

on  replace  to  ol.noms 
where  current. height  >20 

do  append  changes (relnome  a  *ol*,  changedoid  =  current. oid) 

Figuur  4.2:  The  relation  changes  and  three  triggers  on  dianges  to  relation  ol 

4.4  Dynamic  Display 

The  dynamic  display  (upability  can  be  implemented  by  means  of  Postgres  asynchronous  poitals 
which  notify  applications  when  a  lule  has  fired.  The  GEO  system  could  define  rales  on  the 
relations  of  interest  (relations  that  should  continuously  be  updiued  on  the  display)  and  will  be 
notified  by  asynchronous  poitals  when  the  rales  fire. 

An  alternative  is  to  have  a  Postgres  relation  containing  the  changa  made  to  the  relations  of 
interest,  and  user  defined  Postgres  ruler  that  append  data  to  this  relation;  see  Figure  4.2  (a,b].  The 
advantage  of  this  ^tproach  is  that  the  user  has  more  control  over  the  type  of  updates,  which  should 
cause  a  update  of  die  display.  This  is  due  to  the  fkt  that  he  (and  not  the  GEO  system)  defines  the 
rales  with  the  appropriate  restrictions.  TheGEOsystemjusthastomonitorthe  changes  relation 
by  means  of  a  defii^g  a  single  rale  on  this  relation.  Examples  of  three  user  defined  triggeis  that 
notify  GEO  system  of  changes  to  relation  ol  are  given  in  Figure  4.2  [c4,e].  Rules  4.2  [c]  and 
4J2  (d]  cause  new  and  deleted  tuples  to  be  (ttn)displayed.  Rule  4.2  [e]  causes  objects  whose  name 
is  changed,  to  be  redisplayed  only  when  they  are  taller  than  20  meters. 

The  GEO  system  should  remove  tuples  in  changes  which  are  olderdian,  for  exanple.1  minute. 

The  tuplea  cannot  be  removed  immediaieiy  after  being  prooesacd  becauae  more  than  one  GEO 
system  cmild  be  monitoringthe  changes  idaiion. 
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5  Conclusion 


Postgies  offen  aevenl  mechanisms  for  developing  advanced  GISs  that  have  not  been  exploited  in 
GEO  system  yet  For  example,  Postgies  offen  tystoric  data  and  venkms  of  rdadons.  There  are 
types  of  GISs  in  which  this  plays  an  important  role:  C2  systems,  GISs  monitoring  of  enviionmem, 
or  GISs  visualizing  census  data,  tt  is  obvious  dial  these  kind  of  applications  will  beneft  from 
the  automatic  stoiage  of  historical  data.  For  example,  there  is  no  extra  codiitg  required  (in  the 
application)  to  solve  the  query: 

ratriava  (t.nama,  t.ainhabitanCs)  from  t  in  town[*l  January  1980']. 

Applications  that  require  geographic  data  at  multiple  scales  are  another  example  where  the  novel 
mechanisms  of  Postgies,  mi^  offer  solutions.  We  are  developing  a  system  that  avoids  storing 
redundant  data,  i.e.,  do  not  simply  store  a  separate  map  for  each  scale.  This  system  mi^  benefit 
from  a  combination  of  tedmiques: 

•  Our  intended  implementation  of  the  Reactive-tree  (28]  within  Postgies. 

•  The  use  of  rales  to  derive  small  scale  m^>s  of  large  scale  databases  [IS]. 

•  Functions  within  Postgies  are  usefiil  for  the  implementation  of  procedural  map  geiKializa- 
tion  techniques.  For  example,  associated  with  a  polyline  or  polygon  is  a  liiK  generalization 
algorithm  to  reduce  the  number  of  points  used,  when  woiking  with  small  scale  maps. 

•  C^posite  type  attributes  (lelation,  (set  of)  tuple)  can  be  used  for  multi-scale  lepiesentation 
of  a  single  ol^ect  These  composite  type  attributes  allow  references  to  other  t^es,  which 
describe  the  refinement  of  objects  at  a  larger  scale  map  [3]. 


Another  important  research  area  deals  with  the  cognitive  (user-interface)  aspects  of  a  GIS.  Tests 
with  real  users  are  necessary  to  determine  what  a  “good”  graphic  iideiface  to  GISs  dwuld  look 
like.  It  is  clear  that  the  diiea  use  of  Postquel  by  end-usen  is  not  optimal. 
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